Abstract

In statistical thermodynamics the 2nd law is properly spelled out in terms
of conditioned probabilities. As such it makes the statement that 'entropy
increases with time' without preferring a time direction. In this paper we
try to explain this statement, which has been well known since the time of the
Ehrenfests, in some detail within a systematic Bayesian approach.

1 Introduction 

First, we wish to make the statement in the abstract more precise. To this end, we 
think of an idealized system, whose state may only change at sharp, discrete times. 
This allows us to speak unambiguously about next and previous points in time. 
Now we make the following 

Assumption. At time $t_i$ the system is in a state $z(t_i)$ of non-maximal entropy. The
statistical 2nd law now makes the following statements about conditioned
probabilities (the condition will not be repeated):

Statement 1. The probability that the state $z(t_i)$ will develop in the future to a
state $z(t_{i+1})$ of larger entropy is larger than the probability for a development into
a state of smaller entropy.






Statement 2. The probability that the state $z(t_i)$ has developed in the past from
a state $z(t_{i-1})$ of larger entropy is larger than the probability of a development
from a state of smaller entropy.

Consequence 1. The likely increase of entropy in the future state development
$z(t_i) \mapsto z(t_{i+1})$ does not imply a likely decrease for the (fictitious) past
development $z(t_i) \mapsto z(t_{i-1})$, but also a likely increase.

Consequence 2. The most likely development $z(t_{i-1}) \mapsto z(t_i)$ is that of
decreasing entropy. Somewhat ironically one may say that it is more likely for the state
$z(t_i)$ to come about through the improbable development from a more probable
state $z(t_{i-1})$, than through the probable development from an improbable state.

To properly understand the last consequence, recall that our condition is placed
on $z(t_i)$, that is, at time $t_i$. For $z(t_i) \mapsto z(t_{i+1})$ this means a retarded or initial
condition, for $z(t_{i-1}) \mapsto z(t_i)$, however, an advanced or final condition. It is this
change of condition which makes this behaviour of entropy possible.

Consequence 3. The mere (likely) increase of entropy does not provide an
orientation of time. It does not serve to define a 'thermodynamic arrow of time'. Rather,
an orientation is usually given by considering a finite time-interval and imposing
a low-entropy condition at one of the two ends of the interval. Without further
structural elements that would allow one to distinguish the two ends, the apparently
existing two possibilities to do so are, in fact, identical. An apparent distinction is
sometimes introduced by stating that the condition at one end is to be understood
as initial. But at this level this merely defines 'initial' to mean that end at which
the condition is placed.

Many notions and types of reasoning in statistical thermodynamics can be well
illustrated in terms of the Ehrenfests' urn model, which is to be regarded as a toy
model of a thermodynamic system, and whose detailed description we present
below. In particular, this holds true for the consequences listed above, for whose
partial illustration this model was designed by Paul and Tatiana Ehrenfest [1]; see
also [2]-[5]. Our presentation will be more detailed than theirs. Nothing of what we
say will be essentially new. Besides being more detailed, we try to take a Bayesian
approach. In what follows it will be important to always relate to the general
formalism of statistical thermodynamics in order not to provoke 'easy' or 'intuitive'
but uncontrolled reasoning. There is always a certain danger of this happening
in the context of simple models. The Appendix collects some elementary notions
which are not explained in the main text. These will be relevant in the following
section.
2 The Urn-Model



Think of two urns, $U_0$ and $U_1$, among which one distributes $N$ numbered balls.
For exact equipartition to be possible, we assume $N$ to be even. A microstate is
given by the numbers of the balls contained in $U_1$ (the complementary set of numbers
then labels the balls in $U_0$). To formalize this we associate a two-valued quantity
$x_i \in \{0,1\}$, $i \in \{1,\dots,N\}$, to each ball, where $x_i = 0$ ($x_i = 1$) stands for the
$i$-th ball being in $U_0$ ($U_1$). This identifies the set of microstates, which we will
call $\Gamma$ (it corresponds to phase space), with $\Gamma = \{0,1\}^N$, a discrete space of $2^N$
elements. It can be further identified with the set of all functions $\{1,\dots,N\} \to \{0,1\}$,
$i \mapsto x_i$. Mathematically speaking, the space $\Gamma$ carries a natural measure, $\mu_\Gamma$,
given by associating to each subset $A \subset \Gamma$ its cardinality: $\mu_\Gamma(A) = |A|$. We now
make the physical assumption that the probability measure (normalized measure)
$\nu_\Gamma := 2^{-N}\mu_\Gamma$ gives the correct physical probabilities. Note that this is a statement
about the dynamics, which here may be expressed by saying that in the course of the
dynamics of the system, all microstates are reached equally often on time average.

Physical observables correspond to functions $\Gamma \to \mathbb{R}$. We call the set of such
functions $\mathcal{O}$. Conversely, it is generally impossible to associate a physically
realizable observable to every element in $\mathcal{O}$. Let $\{O_1,\dots,O_n\} =: \mathcal{O}_{\rm re} \subset \mathcal{O}$ be the
physically realizable ones^1, which we can combine into a single $n$-component observable
$O_{\rm re} \in \mathcal{O}^n$. If $O_{\rm re} : \Gamma \to \mathbb{R}^n$ is injective, the state is determined by the value of $O_{\rm re}$.
In the case of thermodynamical systems it is essential to be far away from injectivity,
in the sense that a given value $\alpha \in \mathbb{R}^n$ should have a sufficiently large pre-image
$O_{\rm re}^{-1}(\alpha) \subset \Gamma$. The coarse-grained or macroscopic state space is then given by the
image $\Omega \subset \mathbb{R}^n$ of the realized observables $O_{\rm re}$. To every macrostate $\alpha \in \Omega$
corresponds a set of microstates: $\Gamma_\alpha := O_{\rm re}^{-1}(\alpha) \subset \Gamma$. The latter form a partition of $\Gamma$:

    $\Gamma_\alpha \cap \Gamma_\beta = \emptyset$ if $\alpha \neq \beta$, and $\bigcup_{\alpha\in\Omega}\Gamma_\alpha = \Gamma$.

The realized observable for the urn-model is given by the number of balls in
$U_1$, that is, $O_{\rm re} = \sum_{i=1}^{N} x_i$. Its range is the set $\Omega = \{0,1,\dots,N\}$ of macrostates,
which contains $N+1$ elements. The macrostates are denoted by $z$. To $z$ there
corresponds the set $\Gamma_z$ of $\binom{N}{z}$ microstates. The probability measure $\nu_\Gamma$ induces
so-called a priori probabilities for macrostates $z$:

    $W_{\rm ap}(z) = \nu_\Gamma(\Gamma_z) = 2^{-N}\binom{N}{z}$.    (1)

1 The subscript 're' can be read as abbreviation for 'realized' or 'relevant'.

Let $X : \Omega \to \mathbb{R}$ be the random variable $z \mapsto X(z) = z$. Its expectation value $E$ and
standard deviation $S$ with respect to the a priori distribution (1) are given by^2

    $E(X,{\rm ap}) = \frac{N}{2}$,    (2)

    $S(X,{\rm ap}) = \frac{\sqrt{N}}{2}$.    (3)
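As an illustration of the counting behind (1)-(3), the following minimal sketch enumerates all microstates for a small system and tabulates the macrostates by brute force. The value `N = 6` and all variable names are assumptions made for this example only; they are not part of the original text.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

N = 6  # small, even number of balls (illustrative choice)

# Every microstate is a tuple (x_1, ..., x_N) with x_i in {0, 1}.
microstates = list(product((0, 1), repeat=N))

# Count the microstates belonging to each macrostate z = number of balls in U_1.
counts = [0] * (N + 1)
for x in microstates:
    counts[sum(x)] += 1

# A priori probabilities, eq. (1): W_ap(z) = 2^{-N} * binom(N, z).
w_ap = [Fraction(c, 2**N) for c in counts]

# Expectation value and standard deviation of X(z) = z, eqs. (2) and (3).
mean = sum(z * w for z, w in enumerate(w_ap))
variance = sum((z - mean) ** 2 * w for z, w in enumerate(w_ap))
std = sqrt(variance)

print(counts)     # multiplicities of the macrostates
print(mean, std)  # to be compared with N/2 and sqrt(N)/2
```

Exact rational arithmetic is used so that the comparison with $N/2$ and $N/4$ does not depend on rounding.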

The system has a Markoffian random evolution, which is defined as follows:
at every discrete time $t_i$, where $i \in \{0,1,2,\dots\}$ with $t_j > t_i$ for $j > i$, a
random generator picks a number $n$ in the interval $1 \le n \le N$. Subsequently
the ball with number $n$ changes the urn. There are two possibilities: The ball
with number $n$ has been in urn $U_0$, so that the change of macrostate is given by
$z \to z+1$. Alternatively, the ball has been in $U_1$ and the change of macrostate is
given by $z \to z-1$. The conditional probabilities $W(z\pm 1;t_{i+1}|z;t_i)$ that, given
the state $z$ at time $t_i$, the evolution will yield the state $z\pm 1$ at time $t_{i+1}$ are given
by

    $W(z+1;t_{i+1}|z;t_i) = \frac{N-z}{N} =: W_{\rm ret}(z+1|z)$,    (4)

    $W(z-1;t_{i+1}|z;t_i) = \frac{z}{N} =: W_{\rm ret}(z-1|z)$.    (5)

Since these are independent of time, we can suppress the arguments $t_i$. We just
have to keep in mind that the left entry, $z\pm 1$, is one time step later than $z$, that is,
the probabilities are past-conditioned or retarded. We indicate this by writing $W_{\rm ret}$.
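The retarded probabilities (4) and (5) can be collected in a transition matrix. The sketch below is one possible way to do so; the function name `retarded_matrix`, the index convention `W[z_new][z_old]`, and the size `N = 4` are choices made here for illustration.

```python
from fractions import Fraction

def retarded_matrix(N):
    """Return W_ret as a nested list with W[z_new][z_old] = W_ret(z_new | z_old)."""
    W = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for z in range(N + 1):
        if z < N:
            W[z + 1][z] = Fraction(N - z, N)  # eq. (4): the drawn ball was in U_0
        if z > 0:
            W[z - 1][z] = Fraction(z, N)      # eq. (5): the drawn ball was in U_1
    return W

N = 4  # illustrative size
W_ret = retarded_matrix(N)

# Each column is a probability distribution over the state one step later.
column_sums = [sum(W_ret[z_new][z_old] for z_new in range(N + 1))
               for z_old in range(N + 1)]

# From z = 3 (three of four balls in U_1) a step down is three times as likely.
up, down = W_ret[4][3], W_ret[2][3]
print(column_sums, up, down)
```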

2 $E(X,{\rm ap}) = 2^{-N}\sum_{z=1}^{N} z\binom{N}{z} = 2^{-N}N\sum_{m=0}^{N-1}\binom{N-1}{m} = \frac{N}{2}$,
  $E(X^2-X,{\rm ap}) = 2^{-N}\sum_{z=2}^{N} z(z-1)\binom{N}{z} = 2^{-N}N(N-1)\sum_{m=0}^{N-2}\binom{N-2}{m} = \frac{N(N-1)}{4}$.

Let $W(z;t_i)$ denote some chosen absolute probability for the state to be $z$ at
time $t_i$ and $W_i : z \mapsto W(z;t_i)$ the probability distribution at time $t_i$. The
dynamics described above will now induce a dynamical law, $W_i \mapsto W_{i+1}$, on such
distributions, given by

    $W(z;t_{i+1}) = W(z;t_{i+1}|z+1;t_i)\,W(z+1;t_i) + W(z;t_{i+1}|z-1;t_i)\,W(z-1;t_i)$    (6)

    $\phantom{W(z;t_{i+1})} = \frac{z+1}{N}\,W(z+1;t_i) + \frac{N-z+1}{N}\,W(z-1;t_i)$,    (7)

whose Markoffian character is obvious. To be sure, $W_i$, $i > 0$, will depend on
the initial distribution $W_0$. This dependence will be essential if $W_0$ is far from
equilibrium and the number of time steps $i$ is not much larger than the number $N$ of
balls. Conversely, one expects that $W_i$ will approach an equilibrium distribution
$W_{\rm stat}$ for $i \gg N$, where $W_{\rm stat}$ is independent of $W_0$. Its uniqueness is shown by

Theorem 1. A distribution $W_{\rm stat}$ which is stationary under (7) is uniquely given by $W_{\rm stat} = W_{\rm ap}$.

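Proof. We show that $W_{\rm stat}$ can be uniquely determined from (7). To this end, we
assume a time independent distribution $W_{\rm stat}$, and write (7) in the form

    $W_{\rm stat}(z+1) = \frac{N}{z+1}\,W_{\rm stat}(z) - \frac{N-z+1}{z+1}\,W_{\rm stat}(z-1)$.    (8)

Since $W_{\rm stat}(-1) = 0$ we have for $z = 0$ that $W_{\rm stat}(1) = N\,W_{\rm stat}(0)$, hence recursively
$W_{\rm stat}(2) = \frac{1}{2}N(N-1)\,W_{\rm stat}(0)$ and $W_{\rm stat}(3) = \frac{1}{6}N(N-1)(N-2)\,W_{\rm stat}(0)$. By
induction we get the general formula $W_{\rm stat}(z) = \binom{N}{z}W_{\rm stat}(0)$. Indeed, inserting
this expression for $z$ and $z-1$ into the right hand side of (8), we obtain

    $W_{\rm stat}(z+1) = \left[\frac{N}{z+1}\binom{N}{z} - \frac{N-z+1}{z+1}\binom{N}{z-1}\right] W_{\rm stat}(0)$
    $\phantom{W_{\rm stat}(z+1)} = \left[\frac{N}{z+1} - \frac{z}{z+1}\right]\binom{N}{z}\,W_{\rm stat}(0)$
    $\phantom{W_{\rm stat}(z+1)} = \binom{N}{z+1}\,W_{\rm stat}(0)$.    (9)

The value of $W_{\rm stat}(0)$ is finally determined by the normalization condition:

    $1 = \sum_{z=0}^{N} W_{\rm stat}(z) = W_{\rm stat}(0)\sum_{z=0}^{N}\binom{N}{z} = W_{\rm stat}(0)\,2^N$.    (10)
□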
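The statement of Theorem 1 can be checked numerically for a given $N$. The following sketch applies the dynamical law (7) once to the binomial distribution and, for contrast, to the constant distribution; the helper name `evolve` and the choice `N = 10` are assumptions of this example.

```python
from fractions import Fraction
from math import comb

def evolve(dist):
    """Apply eq. (7) once to a distribution given as a list indexed by z."""
    N = len(dist) - 1
    new = []
    for z in range(N + 1):
        from_above = Fraction(z + 1, N) * dist[z + 1] if z < N else 0
        from_below = Fraction(N - z + 1, N) * dist[z - 1] if z > 0 else 0
        new.append(from_above + from_below)
    return new

N = 10  # illustrative size

# Binomial (a priori) distribution: expected to be a fixed point of (7).
w_ap = [Fraction(comb(N, z), 2**N) for z in range(N + 1)]
binomial_is_stationary = evolve(w_ap) == w_ap

# Constant distribution: normalized, but not a fixed point of (7).
uniform = [Fraction(1, N + 1)] * (N + 1)
uniform_is_stationary = evolve(uniform) == uniform

print(binomial_is_stationary, uniform_is_stationary)
```

This only verifies stationarity for one value of $N$; uniqueness is what the recursion (8) in the proof establishes.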



2.1 Future-conditioned probabilities and Bayes' rule 

Consider a probability space and a set of events, $\{A_1,\dots,A_n\}$, which is 1.) complete,
i.e. $A_1 \cup \dots \cup A_n = 1$ (here 1 denotes the certain event), and 2.) mutually exclusive,
i.e. $i \neq j \Rightarrow A_i \cap A_j = 0$ (here 0 denotes the impossible event). The probability of
an event $B$ then obeys Bayes' rule^3: $W(B) = \sum_{k=1}^{n} W(B|A_k)\,W(A_k)$. This is just
what we used in (6). This rule now allows us to deduce the inversely conditioned
probabilities:

    $W(A_k|B) = \frac{W(B|A_k)\,W(A_k)}{\sum_{i=1}^{n} W(B|A_i)\,W(A_i)}$.    (11)

3 We deliberately avoid calling it Bayes' theorem.

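We now identify the $A_i$ with the $N+1$ events $(z';t_i)$ at the fixed time $t_i$, where
$z' = 0,\dots,N$, and $A_k$ with the special event $(z\pm 1;t_i)$. Further we identify
the event $B$ with $(z;t_{i+1})$, i.e. with the occurrence of $z$ at the later time $t_{i+1}$. Then
we obtain:

    $W(z\pm 1;t_i|z;t_{i+1}) = \frac{W(z;t_{i+1}|z\pm 1;t_i)\,W(z\pm 1;t_i)}{W(z;t_{i+1})}$    (12)

    $\phantom{W(z\pm 1;t_i|z;t_{i+1})} = \frac{W(z;t_{i+1}|z\pm 1;t_i)\,W(z\pm 1;t_i)}{\sum_{z'=0}^{N} W(z;t_{i+1}|z';t_i)\,W(z';t_i)}$.    (13)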



Hence, given $W_i$, a formal application of Bayes' rule allows us to express the future
conditioned ('advanced') probabilities in terms of the past conditioned ('retarded')
ones. In our case we think of the latter ones as given by (4) and (5). Hence we obtain
the conditioned probability for $(z\pm 1;t_i)$, given that at the later time $t_{i+1}$ the state
$z$ will occur:

    $W(z+1;t_i|z;t_{i+1}) = \frac{W(z+1;t_i)}{W(z+1;t_i) + \frac{N-z+1}{z+1}\,W(z-1;t_i)}$,    (14)

    $W(z-1;t_i|z;t_{i+1}) = \frac{W(z-1;t_i)}{W(z-1;t_i) + \frac{z+1}{N-z+1}\,W(z+1;t_i)}$.    (15)
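The Bayesian inversion (12)-(13) and its closed form (14) can be compared directly. The sketch below does this for a constant distribution $W_i$, which is chosen here only as a convenient example of a non-equilibrium distribution; the function names are likewise assumptions of this illustration.

```python
from fractions import Fraction

def w_ret(z_new, z_old, N):
    """Retarded transition probability W_ret(z_new | z_old), eqs. (4)-(5)."""
    if z_new == z_old + 1:
        return Fraction(N - z_old, N)
    if z_new == z_old - 1:
        return Fraction(z_old, N)
    return Fraction(0)

def advanced(z_prev, z, dist):
    """W(z_prev; t_i | z; t_{i+1}) by Bayes' rule, eqs. (12)-(13)."""
    N = len(dist) - 1
    numerator = w_ret(z, z_prev, N) * dist[z_prev]
    denominator = sum(w_ret(z, zp, N) * dist[zp] for zp in range(N + 1))
    return numerator / denominator

def advanced_up_closed_form(z, dist):
    """Right-hand side of eq. (14), used here for 1 <= z <= N - 1."""
    N = len(dist) - 1
    return dist[z + 1] / (dist[z + 1] + Fraction(N - z + 1, z + 1) * dist[z - 1])

N = 4
uniform = [Fraction(1, N + 1)] * (N + 1)  # hypothetical W_i, not the equilibrium one

# Given z = 1 at the later time: where did the system most likely come from?
p_from_above = advanced(2, 1, uniform)
p_from_below = advanced(0, 1, uniform)
print(p_from_above, p_from_below)
```

Note that the advanced probabilities depend on the chosen $W_i$, in contrast to the retarded ones.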

2.2 Flow equilibrium 

The condition for having flow equilibrium for the pair of times $t_i$, $t_{i+1}$ reads

    $W(z\pm 1;t_{i+1}|z;t_i)\,W(z;t_i) = W(z;t_{i+1}|z\pm 1;t_i)\,W(z\pm 1;t_i)$.    (16)

It already implies $W_i = W_{\rm ap}$, since (4) and (5) give^4 $W(z+1;t_i) = \frac{N-z}{z+1}\,W(z;t_i)$, which
leads to $W(z;t_i) = \binom{N}{z}W(0;t_i)$. Since $1 = \sum_z W(z;t_i)$ we have $W(0;t_i) = 2^{-N}$.
Using Theorem 1, we conclude that flow equilibrium at $t_i$ implies $W_j = W_{\rm ap}$
for $j \ge i$.

4 Without using (4) and (5) one gets

    $W(z\pm 1;t_{i+1}|z;t_i)\,W(z;t_i) = W(z;t_{i+1}|z\pm 1;t_i)\,W(z\pm 1;t_i)$
    $\phantom{W(z\pm 1;t_{i+1}|z;t_i)\,W(z;t_i)} = W(z\pm 1;t_i|z;t_{i+1})\,W(z;t_{i+1})$,    (17)

where the last equality is the identity $W(a|b)W(b) = W(b|a)W(a)$. The local (in time) condition
of flow equilibrium is therefore equivalent to (cf. (19))

    $\frac{W(z\pm 1;t_{i+1}|z;t_i)}{W(z\pm 1;t_i|z;t_{i+1})} = \frac{W(z;t_{i+1})}{W(z;t_i)}$.    (18)

2.3 Time-reversal invariance 

To be distinguished from flow equilibrium is time-reversal invariance. The latter is 
given by the following equality of past- and future-conditioned probabilities: 

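    $W(z\pm 1;t_{i+1}|z;t_i) = W(z\pm 1;t_i|z;t_{i+1})$    (19)

    $\phantom{W(z\pm 1;t_{i+1}|z;t_i)} = \frac{W(z;t_{i+1}|z\pm 1;t_i)\,W(z\pm 1;t_i)}{W(z;t_{i+1})}$    [by (12)]    (20)

    $\Longleftrightarrow\quad W(z;t_{i+1}) = \frac{z+1}{N-z}\,W(z+1;t_i)$    (21)

    $\phantom{\Longleftrightarrow\quad W(z;t_{i+1})} = \frac{N-z+1}{z}\,W(z-1;t_i)$.    (22)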

It is interesting to note that the condition of time-reversal invariance is weaker
than that of flow equilibrium. The former is implied by, but does not itself imply,
the equilibrium distribution. Let us explain this in more detail: Equations (21) and
(22) imply (7), since $\frac{N-z}{N}\times(21) + \frac{z}{N}\times(22) = (7)$. Hence (21)-(22) are stable
under time evolution (7). Conversely, (21)-(22) is implied by (7) and the following
equation, expressing the equality of the right hand sides of (21) and (22):

    $W(z+1;t_i) = \frac{N-z}{z+1}\,\frac{N-z+1}{z}\,W(z-1;t_i)$.    (23)

Indeed, eliminating $W(z+1;t_i)$ in (7) using (23), one gets

    $W(z;t_{i+1}) = \frac{N-z+1}{z}\,W(z-1;t_i) \overset{(23)}{=} \frac{z+1}{N-z}\,W(z+1;t_i)$,    (24)

hence (21)-(22). Time-reversal invariance for future times is therefore equivalent to
the 'constraint' (23) for the initial condition. It allows for a one-parameter family
of solutions, since it determines $W_i$ for given $p := W(0;t_i)$ and $q := W(1;t_i)$.

Indeed, in analogy to the proof of Theorem 1 one gets $W_i(z) = p\binom{N}{z}$ for $z$ even
and $W_i(z) = \frac{q}{N}\binom{N}{z}$ for $z$ odd. Since $\sum_{z\,{\rm even}}\binom{N}{z} = \sum_{z\,{\rm odd}}\binom{N}{z} = 2^{N-1}$,
the normalization condition leads to $1 = 2^{N-1}(p + q/N) \Rightarrow q = N(2^{-(N-1)} - p)$.
This shows that $p \in [0, 2^{-(N-1)}]$ faithfully parameterizes all distributions obeying
(23). One should note that solutions to (23) are closed under convex sums. In this
way one sees that the obtained distributions are the convex sums $W_i = \lambda W^e +
(1-\lambda)W^o$, with weight $\lambda := 2^{N-1}p \in [0,1]$, of the 'even' distribution,
$W^e(z) = (1+(-1)^z)\,2^{-N}\binom{N}{z}$, and the 'odd' distribution,
$W^o(z) = (1-(-1)^z)\,2^{-N}\binom{N}{z}$. Solutions to (23) form a closed
interval within the simplex $\Delta^N$, which connects the point $W^e$ in the $\frac{N}{2}$-sub-simplex
$\Delta^{1\,3\cdots N-1}$ with the point $W^o$ in the $(\frac{N}{2}-1)$-sub-simplex $\Delta^{0\,2\,4\cdots N}$. If we call this
interval $\Delta^*$, we have

Theorem 2. The set $\Delta^* \subset \mathcal{W}$ is invariant under time evolution. The future
development using $W(z;t_{i+1}|z';t_i)$ and the past development using $W(z;t_i|z';t_{i+1})$
coincide.^5

It is of central importance to note that the past development is, mathematically 
speaking, not the inverse operation to the future development. The reason being 
precisely that such a change in the direction of development is linked with a change 
from retarded to advanced conditionings in the probabilities. 
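The content of Theorem 2 can be illustrated numerically. The sketch below builds one member of the family of convex sums of the 'even' and 'odd' distributions, evolves it by (7), and checks that the constraint (23) is preserved and that advanced and retarded probabilities agree. The convex weight `lam`, the size `N = 6`, and the helper names are assumptions made for this example.

```python
from fractions import Fraction
from math import comb

def evolve(dist):
    """Apply eq. (7) once to a distribution given as a list indexed by z."""
    N = len(dist) - 1
    new = []
    for z in range(N + 1):
        from_above = Fraction(z + 1, N) * dist[z + 1] if z < N else 0
        from_below = Fraction(N - z + 1, N) * dist[z - 1] if z > 0 else 0
        new.append(from_above + from_below)
    return new

def family_member(lam, N):
    """Convex sum lam * W^e + (1 - lam) * W^o of the even and odd distributions."""
    out = []
    for z in range(N + 1):
        weight = lam if z % 2 == 0 else 1 - lam
        out.append(2 * weight * Fraction(comb(N, z), 2**N))
    return out

def satisfies_constraint(dist):
    """Check eq. (23) for all z with 1 <= z <= N - 1."""
    N = len(dist) - 1
    return all(
        dist[z + 1] == Fraction((N - z) * (N - z + 1), (z + 1) * z) * dist[z - 1]
        for z in range(1, N)
    )

N, lam = 6, Fraction(1, 3)  # illustrative values
w_i = family_member(lam, N)
w_next = evolve(w_i)

# Advanced probability via eq. (12) versus retarded probability, eq. (4), at z = 2.
z = 2
adv_up = Fraction(z + 1, N) * w_i[z + 1] / w_next[z]
ret_up = Fraction(N - z, N)
print(satisfies_constraint(w_i), satisfies_constraint(w_next), adv_up, ret_up)
```

In this example one time step exchanges the weights of the even and odd parts, so the evolved distribution is again a member of the family.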



3 General Consequences 

In the following we want to restrict to the equilibrium condition. In this case the
future-conditioned probabilities are independent of the $t_i$ and we can write
$W(z\pm 1;t_i|z;t_{i+1}) =: W_{\rm av}(z\pm 1|z)$. Hence we have:

    $W_{\rm ret}(z+1|z) = W_{\rm av}(z+1|z) = \frac{N-z}{N}$,    (27)

    $W_{\rm ret}(z-1|z) = W_{\rm av}(z-1|z) = \frac{z}{N}$,    (28)

from which statements 1 and 2 made in the Introduction follow. Indeed, let $z =
z(t_i) > N/2$; then the probability that at time $t_{i-1}$ or $t_{i+1}$ the state was or will
be $z-1$ is, in both cases, given by $z/N$. The probability for the state $z+1$ at time
$t_{i-1}$ or $t_{i+1}$ is $(N-z)/N$. Now, every change of state in the direction of the equilibrium
distribution leads to an increase in entropy (see below). Hence the probability of
having a higher entropy at $t_{i-1}$ or $t_{i+1}$ is $z/(N-z)$ times that of having a lower entropy.
If $z = z(t_i) < N/2$ we have to use the inverse of that.
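The equality of advanced and retarded probabilities in equilibrium, (27)-(28), can be verified by inserting the a priori distribution into Bayes' rule (12). The following sketch does this with exact fractions; `N = 10`, `z = 7`, and the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def w_ret(z_new, z_old, N):
    """Retarded transition probability W_ret(z_new | z_old), eqs. (4)-(5)."""
    if z_new == z_old + 1:
        return Fraction(N - z_old, N)
    if z_new == z_old - 1:
        return Fraction(z_old, N)
    return Fraction(0)

def w_ap(z, N):
    """A priori (equilibrium) probability of the macrostate z, eq. (1)."""
    return Fraction(comb(N, z), 2**N)

def w_av(z_prev, z, N):
    """Advanced probability in equilibrium: eq. (12) with W_i = W_{i+1} = W_ap."""
    return w_ret(z, z_prev, N) * w_ap(z_prev, N) / w_ap(z, N)

N = 10  # illustrative size

# Compare advanced and retarded probabilities for all admissible neighbours.
agree = all(
    w_av(zp, z, N) == w_ret(zp, z, N)
    for z in range(N + 1)
    for zp in (z - 1, z + 1)
    if 0 <= zp <= N
)

# For z > N/2 the neighbour z - 1 has the higher entropy.
z = 7
ratio = w_av(z - 1, z, N) / w_av(z + 1, z, N)
print(agree, ratio)  # ratio to be compared with z / (N - z)
```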

5 Explicitly one can see the preservation of (23) under time evolution as follows: Given that
the initial distribution $W_i$ satisfies (23), the development is equivalent to (21)-(22). Hence

    $W(z-1;t_i) = \frac{z}{N-z+1}\,W(z;t_{i+1})$,    (25)

    $W(z+1;t_i) = \frac{z+2}{N-z-1}\,W(z+2;t_{i+1})$,    (26)

which allows one to rewrite (23) for $W_i$ into (23) for $W_{i+1}$.
3.1 Boltzmann Entropy



Boltzmann Entropy $S_B$ is a function $S_B : \Omega \to \mathbb{R}$. We stress that since $\Omega$ is
defined only after a choice of coarse graining (i.e. a choice of $O_{\rm re}$) has been made,
Boltzmann Entropy, too, must be understood as relative to that choice.^6 The value
$S_B(z)$ in the macrostate $z$ is defined by $S_B(z) := \ln\mu_\Gamma(\Gamma_z)$. For the urn model
this corresponds to the logarithm of the number of microstates that correspond to the
macrostate $z$. In what follows it will sometimes be more convenient to label the macrostate
not by $z \in [0,N]$, but rather by a parameter $\sigma \in [-1,1]$ of range independent of
$N$. Let the latter be defined by $z = \frac{N}{2}(1+\sigma)$. If we assume that $N, z, (N-z) \gg 1$
and approximate $\ln N! = N\ln N - N + O(\ln N)$ (Stirling formula), we obtain the
following expression for the Boltzmann entropy:

    $S_B(z) = N\ln N - z\ln z - (N-z)\ln(N-z)$,    (29)

    $S_B(\sigma) = -\frac{N}{2}\left[\ln\frac{1-\sigma^2}{4} + \sigma\ln\frac{1+\sigma}{1-\sigma}\right]$.    (30)

It obeys $S_B(\sigma) = S_B(-\sigma) = S_B(|\sigma|)$, which just corresponds to the invariance of
the first expression under $z \mapsto N-z$. Considered as a function of $|\sigma|$, $S_B : [0,1] \to
[\ln 2^N, 0]$ is strictly monotonically decreasing. That $S_B(\sigma = 1) = 0$ is best seen
in the limit $z \to N$ of (29). Despite Stirling's approximation this value is, in fact,
exact, as one easily infers from the fact that $z = N$ just corresponds to a single
microstate. In contrast, the given value at $\sigma = 0$ is only approximately valid.
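To get a feeling for the quality of the Stirling expression (29), one may compare it with the exact value $\ln\binom{N}{z}$. The sketch below does so for a few macrostates; `N = 100`, the sample values of $z$, and the helper names are choices made for this illustration, and the convention $0\ln 0 = 0$ is used at the end point $z = N$.

```python
from math import comb, log

def s_exact(N, z):
    """Exact Boltzmann entropy: logarithm of the number of microstates of z."""
    return log(comb(N, z))

def xlogx(x):
    """x * ln(x) with the convention 0 * ln(0) = 0."""
    return x * log(x) if x > 0 else 0.0

def s_stirling(N, z):
    """Stirling approximation, eq. (29)."""
    return xlogx(N) - xlogx(z) - xlogx(N - z)

N = 100  # illustrative size
for z in (50, 75, 100):
    print(z, s_exact(N, z), s_stirling(N, z))
```

At $z = N$ both expressions vanish, while at $z = N/2$ the approximation yields $N\ln 2$ and thus overestimates the exact value by a term of order $\ln N$.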



3.2 Consequences 1 and 2 

The quantitative form of Consequences 1 and 2 is given by the solution to the
following exercise: Let the state at time $t_i$ be $z = z(t_i)$. Calculate the conditioned
probabilities for

(i) $z(t_i)$ being a local maximum,

(ii) $z(t_i)$ being a local minimum,

(iii) $z(t_i)$ lying on a segment of positive slope,

(iv) $z(t_i)$ lying on a segment of negative slope.

Let the corresponding probabilities be $W_{\max}(z)$, $W_{\min}(z)$, $W_\uparrow(z)$, and $W_\downarrow(z)$
respectively. These are each given by the product of one past and one future
conditioned probability. This is a result of the Markoffian character of the dynamics,
i.e. that for given $(z;t_i)$ the dynamical evolution $(z;t_i) \to (z\pm 1;t_{i+1})$ is
independent of $z(t_{i-1})$. Using (27) and (28), we obtain:

    $W_{\max}(z) = W_{\rm av}(z-1|z)\,W_{\rm ret}(z-1|z) = \left(\frac{z}{N}\right)^2$,    (31)

    $W_{\min}(z) = W_{\rm av}(z+1|z)\,W_{\rm ret}(z+1|z) = \left(\frac{N-z}{N}\right)^2$,    (32)

    $W_\uparrow(z) = W_{\rm av}(z-1|z)\,W_{\rm ret}(z+1|z) = \frac{z}{N}\left(1-\frac{z}{N}\right)$,    (33)

    $W_\downarrow(z) = W_{\rm av}(z+1|z)\,W_{\rm ret}(z-1|z) = \frac{z}{N}\left(1-\frac{z}{N}\right)$.    (34)

6 This apparently non-objective character of entropy is often complained about. But this criticism
is based on a misconception, since the term thermodynamical system is not defined without a choice
for $O_{\rm re}$. This is no different in phenomenological thermodynamics, where the choice of 'work degrees
of freedom', $\{y^i\}$, (the relevant or controlled degrees of freedom) is part of the definition of 'system'.
Only after they have been specified can one define the differential one-form of heat, $\delta Q$, as the
difference between the differential of total energy, $dE$, and the differential one-form of reversible
work, $\delta A := f_i\,dy^i$. (Here $\delta$ is just meant to indicate that the quantity in question is a one-form, not
that it is the differential, $d$, of a function; i.e. $d\delta A \neq 0$ and $d\delta Q \neq 0$ in general.) Hence we define
$\delta Q := dE - \delta A$. Roughly speaking, one may say that 'heat' is the energy that is localized in the
non-relevant (not controlled) degrees of freedom.

For $z/N > \frac{1}{2}$ ($z/N < \frac{1}{2}$) the probability $W_{\max}$ ($W_{\min}$) dominates the other ones.
Expressed in terms of $\sigma$ the ratios of probabilities are given by the simple expressions:

    $W_{\max}(\sigma) : W_{\min}(\sigma) : W_\uparrow(\sigma) : W_\downarrow(\sigma) = \frac{1+\sigma}{1-\sigma} : \frac{1-\sigma}{1+\sigma} : 1 : 1$.    (35)

In the limiting case of infinitely many $t_i$ we get that the state $z$ is $z^2/(N^2-z^2) =
(1+\sigma)^2/[(1-\sigma)(3+\sigma)]$ times more often a maximum than one of the remaining
three possibilities taken together.
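The four probabilities (31)-(34) are simple enough to tabulate directly. The following sketch evaluates them for one macrostate and forms the odds of a local maximum against the three other shapes combined; the dictionary keys, the function name, and the values `N = 10`, `z = 8` are assumptions of this example.

```python
from fractions import Fraction

def shape_probabilities(z, N):
    """Equilibrium probabilities (31)-(34) for the local shape of the path at z."""
    up = Fraction(N - z, N)   # W_ret(z+1|z) = W_av(z+1|z), eq. (27)
    down = Fraction(z, N)     # W_ret(z-1|z) = W_av(z-1|z), eq. (28)
    return {
        "max": down * down,     # came from z-1, goes to z-1
        "min": up * up,         # came from z+1, goes to z+1
        "rising": down * up,    # came from z-1, goes to z+1
        "falling": up * down,   # came from z+1, goes to z-1
    }

N, z = 10, 8  # illustrative values, z > N/2
p = shape_probabilities(z, N)
total = sum(p.values())
odds_max = p["max"] / (p["min"] + p["rising"] + p["falling"])
print(p, total, odds_max)  # odds to be compared with z^2 / (N^2 - z^2)
```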

We also note an expression for the expected recurrence time, $T(z)$, for the state
$z$.^7 It is derived in [5] (there formula (66)). If the draws from the urns have constant
time separation $\Delta t$ one has

    $T(z) = \frac{\Delta t}{W_{\rm ap}(z)}$    (36)

and hence a connection between mean recurrence time and entropy:

    $S_B(z) = \ln\left[\frac{2^N\,\Delta t}{T(z)}\right]$.    (37)

7 Note that we talk about recurrence in the space $\Omega$ of macrostates ('coarse grained' states), not
in the space $\Gamma$ of microstates.
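Formula (36) can be cross-checked for small $N$ by computing the mean number of draws until a macrostate recurs directly from the transition probabilities (4)-(5). The sketch below uses the standard first-step recursion for nearest-neighbour chains; this recursion is a technique chosen here for illustration and is not taken from reference [5]. Times are measured in units of $\Delta t$, and all names are assumptions of this example.

```python
from fractions import Fraction
from math import comb

def mean_recurrence_steps(z, N):
    """Expected number of draws until the macrostate z recurs (exact)."""
    def up(k):
        return Fraction(N - k, N)   # W_ret(k+1|k)

    def down(k):
        return Fraction(k, N)       # W_ret(k-1|k)

    # e[k]: expected draws to reach k+1 from k, built from k = 0 upwards.
    e = {}
    for k in range(0, z):
        e[k] = (1 + down(k) * e.get(k - 1, 0)) / up(k)

    # d[k]: expected draws to reach k-1 from k, built from k = N downwards.
    d = {}
    for k in range(N, z, -1):
        d[k] = (1 + up(k) * d.get(k + 1, 0)) / down(k)

    total = Fraction(1)             # the first draw away from z
    if z < N:
        total += up(z) * d[z + 1]   # stepped up, must come back down
    if z > 0:
        total += down(z) * e[z - 1] # stepped down, must come back up
    return total

N = 6  # illustrative size
recurrence = [mean_recurrence_steps(z, N) for z in range(N + 1)]
expected = [Fraction(2**N, comb(N, z)) for z in range(N + 1)]  # 1 / W_ap(z)
print(recurrence)
```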






Reference [5] also shows the recurrence theorem, which for discrete state
spaces asserts the recurrence of each state with certainty. More precisely: let
$W'(z';t_{i+n}|z;t_i)$ be the probability that for given state $z$ at time $t_i$ the state $z'$
occurs at time $t_{i+n}$ for the first time after $t_i$ (this distinguishes $W'$ from $W$), then

    $\sum_{n=1}^{\infty} W'(z;t_{i+n}|z;t_i) = 1$.

3.3 Coarse grained Gibbs entropy and the H-theorem

We recall that the Gibbs entropy $S_G$ lives on the space of probability distributions
(i.e. normed measures) on $\Gamma$ and is hence independent of the choice of $O_{\rm re}$. In
contrast, the coarse grained Gibbs entropy, $S_G^{\rm cg}$, lives on the probability distributions
on $\Omega$, $S_G^{\rm cg} : \mathcal{W} \to \mathbb{R}$, and therefore depends on $O_{\rm re}$. Since the former does serve,
after all, as an $O_{\rm re}$-independent definition of entropy (even though, thermodynamically
speaking, not a very useful one), we distinguish the latter explicitly by the
superscript 'cg'. If at all, it is $S_G^{\rm cg}$ and not $S_G$ that can thermodynamically be
compared to $S_B$. The function $S_G^{\rm cg}$ is given by

    $S_G^{\rm cg}(W) = -\sum_{z=0}^{N} W(z)\,\ln\left[\frac{W(z)}{W_{\rm stat}(z)}\right]$.    (38)

The structure of this expression is highlighted by means of the generalized
H-theorem, which we explain below.^8 Since the two entropies $S_B$ and $S_G^{\rm cg}$ are defined
on different spaces, $\Omega$ and $\mathcal{W}$, it is not immediately clear how to compare them.
To do this, we would have to agree on what value of $S_G^{\rm cg}$ we should compare with
$S_B(z)$, i.e. what argument $W \in \mathcal{W}$ should correspond to $z \in \Omega$. A natural
candidate is the distribution centered at $z$, that is, $W(z') = \delta_z(z')$, which is 1 for
$z' = z$ and zero otherwise. From (38) we then obtain

    $S_G^{\rm cg}(\delta_z) = S_B(z) - N\ln 2$.    (39)

8 Usually this expression is called the relative entropy [of $W$ relative to $W_{\rm stat}$]. As [absolute]
entropy of $W$ one then understands the expression $-\sum_z W(z)\ln W(z)$. The H-theorem would be
valid for the latter only if the constant distribution (in our case $W(z) = 1/(N+1)$) is an equilibrium
distribution, which is not true for the urn model.

Let us now turn to the generalized H-theorem. Let $\Phi : \mathbb{R} \to \mathbb{R}$ be a
convex function. Then for any finite family $m := \{x_1,\dots,x_n\}$ of not necessarily
pairwise distinct points in $\mathbb{R}$ we have the following inequality:
$\Phi\left(\sum_{i=1}^{n}\alpha_i x_i\right) \le \sum_{i=1}^{n}\alpha_i\Phi(x_i)$ $\forall\,\alpha_i \in \mathbb{R}_{\ge 0}$ with $\sum_i\alpha_i = 1$,
where, for strictly convex $\Phi$, equality holds iff there is no
index pair $i, j$ such that $x_i \neq x_j$ and $\alpha_i\cdot\alpha_j \neq 0$. In the latter case the convex sum
is called trivial. We now define a function $H : \mathcal{W}\times\mathcal{W} \to \mathbb{R}$ through

    $H(W,W') := \sum_{z=0}^{N} W'(z)\,\Phi\left(\frac{W(z)}{W'(z)}\right)$.    (40)



Consider a time evolution $W_i \mapsto W_{i+1}$, $W_{i+1}(z) := \sum_{z'} W(z|z')\,W_i(z')$, where
clearly $W(z|z') \ge 0$ and $\sum_z W(z|z') = 1$. We also assume that no row of the
matrix $W(z|z')$ just contains zeros (which would mean that the state labelled by the
corresponding row number is impossible to reach). We call such time evolutions
and the corresponding matrices non-degenerate. In what follows those distributions
$W \in \mathcal{W}$ for which $W(z) \neq 0\ \forall z$, i.e. those from the interior
$\mathring{\mathcal{W}} \subset \mathcal{W}$, will play a
special role. We call them generic. The condition on $W(z|z')$ to be non-degenerate
then ensures that the evolution leaves the set of generic distributions invariant.
After these preparations we formulate

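Theorem 3 (generalized H-theorem). Let $W'_i$ be generic and the time evolution
non-degenerate; then $H(W_{i+1},W'_{i+1}) \le H(W_i,W'_i)$.

Proof. (Adaptation of the proof of theorem 3 in [6] for the discrete case.) We
define a new matrix $V(z|z') := [W'_{i+1}(z)]^{-1}\,W(z|z')\,W'_i(z')$, which generates the
time evolution for $W_i(z)/W'_i(z)$ and obeys $\sum_{z'} V(z|z') = 1$. It follows:

    $H(W_{i+1},W'_{i+1}) = \sum_{z=0}^{N} W'_{i+1}(z)\,\Phi\left(\frac{W_{i+1}(z)}{W'_{i+1}(z)}\right)$    (41)

    $\phantom{H(W_{i+1},W'_{i+1})} = \sum_{z=0}^{N} W'_{i+1}(z)\,\Phi\left[\sum_{z'=0}^{N} V(z|z')\,\frac{W_i(z')}{W'_i(z')}\right]$    (42)

    $\phantom{H(W_{i+1},W'_{i+1})} \le \sum_{z'=0}^{N}\sum_{z=0}^{N} W'_{i+1}(z)\,V(z|z')\,\Phi\left(\frac{W_i(z')}{W'_i(z')}\right)$    (43)

    $\phantom{H(W_{i+1},W'_{i+1})} = \sum_{z'=0}^{N} W'_i(z')\,\Phi\left(\frac{W_i(z')}{W'_i(z')}\right)$    (44)

    $\phantom{H(W_{i+1},W'_{i+1})} = H(W_i,W'_i)$.    (45)

Equality in (43) holds iff the convex sum in the square brackets of (42) is trivial.

□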

Picking a stationary distribution for $W'$, which in our case is the unique
distribution $W_{\rm stat}$, $H$ becomes a function of just one argument which does not increase
in time. Taking in addition the special convex function $\Phi(x) = x\ln(x)$, we
obtain with $S_G^{\rm cg} := -H$ the above mentioned entropy formula.

Let from now on $\Phi$ be as just mentioned. Then we have, due to $\ln(x) \ge 1 - x^{-1}$,
with equality iff $x = 1$:

    $H(W,W') = \sum_{z=0}^{N} W(z)\,\ln\left[\frac{W(z)}{W'(z)}\right] \ge \sum_{z=0}^{N}\left(W(z) - W'(z)\right) = 0$,    (46)

    with equality $\Longleftrightarrow\ W(z) = W'(z)\ \forall z$.    (47)
(47) 



Let us denote by a distance function on a set $M$ any function $d : M\times M \to
\mathbb{R}_{\ge 0}$ such that $d(x,y) = d(y,x)$ and $d(x,y) = 0 \Leftrightarrow x = y$. (This is more
general than a metric, which in addition must satisfy the triangle inequality.) A
map $\tau : M \to M$ is called non-expanding with respect to $d$ iff $d(\tau(x),\tau(y)) \le
d(x,y)\ \forall x,y \in M$. We have

Theorem 4. $D : \mathring{\mathcal{W}}\times\mathring{\mathcal{W}} \to \mathbb{R}$, $D(W,W') := H(W,W') + H(W',W)$, is a
distance function with respect to which every proper non-degenerate time evolution
is non-expanding.

Proof. Symmetry is clear and (47) immediately implies $D(W,W') \ge 0$ with
equality iff $W = W'$, as follows from the separate positivity of each summand.
Likewise (45) holds for each summand, so that no distance increases. □
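The monotonic behaviour asserted by the generalized H-theorem can be observed numerically for the urn model. The sketch below evolves a distribution by the law $W_{i+1}(z) = \frac{z+1}{N}W_i(z+1) + \frac{N-z+1}{N}W_i(z-1)$ and records $H(W_i, W_{\rm stat})$ for $\Phi(x) = x\ln x$. The initial distribution (equal weight on $z = 0$ and $z = 1$), the size `N = 20`, the number of steps, and the floating-point tolerance are assumptions of this example.

```python
from math import comb, log

def evolve(dist):
    """One step of the urn dynamics acting on a list of probabilities W_i(z)."""
    N = len(dist) - 1
    new = []
    for z in range(N + 1):
        from_above = (z + 1) / N * dist[z + 1] if z < N else 0.0
        from_below = (N - z + 1) / N * dist[z - 1] if z > 0 else 0.0
        new.append(from_above + from_below)
    return new

def h_functional(w, w_ref):
    """H(W, W') = sum_z W(z) ln(W(z)/W'(z)), with the convention 0 ln 0 = 0."""
    return sum(p * log(p / q) for p, q in zip(w, w_ref) if p > 0)

N = 20  # illustrative size
w_stat = [comb(N, z) / 2**N for z in range(N + 1)]

# Initial distribution: equal weight on z = 0 and z = 1, so that even and odd
# macrostates carry the same total probability from the start.
w = [0.5, 0.5] + [0.0] * (N - 1)

h_values = []
for _ in range(200):
    h_values.append(h_functional(w, w_stat))
    w = evolve(w)

# Allow for rounding noise of the floating-point arithmetic.
non_increasing = all(b <= a + 1e-12 for a, b in zip(h_values, h_values[1:]))
print(non_increasing, h_values[0], h_values[-1])
```

Since $S_G^{\rm cg} = -H$, a non-increasing $H$ corresponds to a non-decreasing coarse grained Gibbs entropy.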



4 Thermodynamic limit and deterministic dynamics 

In this section we wish to show how to get a deterministic evolution for random
variables in the limit $N \to \infty$. To this end we first consider the discrete, future
directed time evolution of the expectation value of the random variable $X(z) = z$.
We have

    $E(X,t_{i+1}) = \sum_{z'=0}^{N} z'\,W_{i+1}(z') = \sum_{z'=0}^{N}\sum_{z=0}^{N} z'\,W_{\rm ret}(z'|z)\,W_i(z)$    (48)

    $\phantom{E(X,t_{i+1})} = \sum_{z=0}^{N}\left[(z+1)\frac{N-z}{N} + (z-1)\frac{z}{N}\right]W_i(z) = 1 + \left(1-\frac{2}{N}\right)E(X,t_i)$.    (49)

In the same way we get

    $E(X^2,t_{i+1}) = \sum_{z=0}^{N}\left[(z+1)^2\frac{N-z}{N} + (z-1)^2\frac{z}{N}\right]W_i(z)$
    $\phantom{E(X^2,t_{i+1})} = 1 + 2E(X,t_i) + (1-4/N)\,E(X^2,t_i)$,    (50)

    $V(X,t_{i+1}) = E(X^2,t_{i+1}) - E^2(X,t_{i+1})$
    $\phantom{V(X,t_{i+1})} = (1-4/N)\,V(X,t_i) + \frac{4}{N}E(X,t_i) - \frac{4}{N^2}E^2(X,t_i)$.    (51)

By the evolution being 'future directed' one means that $W_{\rm ret}$ and not $W_{\rm av}$ is used
in the evolution equations, as explicitly shown in (48). In this case one also speaks
of 'forward-directed evolution'.

In order to carry out the limit $N \to \infty$ we use the new random variable $\Sigma :
\Omega \to \mathbb{R}$, $z \mapsto \sigma$, where $\sigma = \frac{2z}{N} - 1$ as above; hence $X = \frac{N}{2}(1+\Sigma)$. Simple replacement
yields

    $E(\Sigma,t_{i+1}) = (1-2/N)\,E(\Sigma,t_i)$,    (52)

    $V(\Sigma,t_{i+1}) = (1-4/N)\,V(\Sigma,t_i) + \frac{4}{N^2}\left(1 - E^2(\Sigma,t_i)\right)$.    (53)



In order to have a sizable fraction of balls moved within a macroscopic time span
$\tau$, we have to appropriately decrease the time steps $\Delta t := t_{i+1} - t_i$ with growing
$N$, e.g. like $\Delta t = \frac{2}{N}\tau$, where $\tau$ is some positive real constant. Its meaning is to be
the time span in which $N/2$ balls change urns. Now we can take the limit $N \to \infty$
of (52) and (53),

    $\frac{d}{dt}E(\Sigma,t) = -\frac{1}{\tau}E(\Sigma,t) \ \Rightarrow\ E(\Sigma,t) = E_0\exp\left(-\frac{t-t_1}{\tau}\right)$,    (54)

    $\frac{d}{dt}V(\Sigma,t) = -\frac{2}{\tau}V(\Sigma,t) \ \Rightarrow\ V(\Sigma,t) = V_0\exp\left(-\frac{2(t-t_2)}{\tau}\right)$,    (55)

where $E_0$, $V_0$, $t_1$, $t_2$ are independent constants. These equations tell us that 1) the
expectation value approaches the equilibrium value $\Sigma = 0$ exponentially fast in the
future, and 2) it does so with exponentially decaying standard deviation. The decay
time of both quantities is $\tau$, the time for $N/2$ draws.
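The recursions for expectation value and variance of the rescaled variable $\sigma = 2z/N - 1$, and their approach to an exponential law, can be checked by evolving a distribution explicitly. The sketch below starts with all balls in one urn and performs $N/2$ draws, i.e. one time unit $\tau$ if $\Delta t = 2\tau/N$; the size `N = 50`, the initial condition, and the helper names are assumptions made for this illustration.

```python
from fractions import Fraction
from math import exp

def evolve(dist):
    """One forward step of the urn dynamics on a list of probabilities W_i(z)."""
    N = len(dist) - 1
    new = []
    for z in range(N + 1):
        from_above = Fraction(z + 1, N) * dist[z + 1] if z < N else 0
        from_below = Fraction(N - z + 1, N) * dist[z - 1] if z > 0 else 0
        new.append(from_above + from_below)
    return new

def sigma_moments(dist):
    """Expectation value and variance of sigma = 2z/N - 1 for a distribution."""
    N = len(dist) - 1
    sigma = [Fraction(2 * z, N) - 1 for z in range(N + 1)]
    mean = sum(s * w for s, w in zip(sigma, dist))
    var = sum(s * s * w for s, w in zip(sigma, dist)) - mean**2
    return mean, var

N = 50                                   # illustrative size
w = [Fraction(0)] * N + [Fraction(1)]    # all balls in U_1, i.e. sigma = 1
steps = N // 2                           # N/2 draws: one time unit tau

recursion_ok = True
for _ in range(steps):
    mean, var = sigma_moments(w)
    w = evolve(w)
    new_mean, new_var = sigma_moments(w)
    # Discrete recursions for mean and variance of sigma.
    recursion_ok &= new_mean == (1 - Fraction(2, N)) * mean
    recursion_ok &= new_var == (1 - Fraction(4, N)) * var + Fraction(4, N**2) * (1 - mean**2)

final_mean = float(sigma_moments(w)[0])
print(recursion_ok, final_mean, exp(-1))  # discrete result versus continuum limit
```

For finite $N$ the discrete result $(1-2/N)^{N/2}$ differs slightly from $e^{-1}$; the difference shrinks as $N$ grows.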

According to the discussions in previous sections it is now clear that in case
of equilibrium identical formulae would have emerged if $W_{\rm av}$ instead of $W_{\rm ret}$ had
been used, for then $W_{\rm av} = W_{\rm ret}$. Most important to note is that the backward
evolution is not obtained by taking the forward evolution and replacing in it $t \mapsto
-t$. The origin of this difference is the fact already emphasized before (following
Theorem 2), that $W_{\rm av}(z;z')$ is not the inverse matrix to $W_{\rm ret}(z;z')$, but rather the
matrix computed according to Bayes' rule.

5 Appendix 

In this Appendix we collect some elementary notions of probability theory, adapted 
to our specific example. 

The space of elementary events^9 is $\Omega = \{0,1,\dots,N\}$. By

    $\mathcal{X} := \{X : \Omega \to \mathbb{R}\}$,    (56)

    $\mathcal{W} := \{W : \Omega \to \mathbb{R}_{\ge 0} \mid \sum_{z\in\Omega} W(z) = 1\}$,    (57)

we denote the sets of random variables and probability distributions respectively,
where $\mathcal{W} \subset \mathcal{X}$. The map $\mathcal{X} \to \mathbb{R}^{N+1}$, $X \mapsto (X(0),X(1),\dots,X(N))$ defines a
bijection which allows us to identify $\mathcal{X}$ with $\mathbb{R}^{N+1}$. This identifies $\mathcal{W}$ with the
$N$-simplex

    $\Delta^N := \{(W(0),\dots,W(N)) \in \mathbb{R}^{N+1} \mid W(z) \ge 0,\ \sum_z W(z) = 1\} \subset \mathbb{R}^{N+1}$.    (58)

Its boundary, $\partial\Delta^N$, is the union of all $(N-K)$-simplices:

    $\Delta^{i_1\cdots i_K} := \{(W(0),\dots,W(N)) \in \Delta^N \mid 0 = W(i_1) = \dots = W(i_K)\}$    (59)

for all $K$. Its interior is $\mathring{\mathcal{W}} := \mathcal{W} - \partial\mathcal{W}$, so that $W \in \mathring{\mathcal{W}} \Leftrightarrow W(z) \neq 0\ \forall z$.

Expectation value $E$, variance $V$, and standard deviation $S$ are functions
$\mathcal{X}\times\mathcal{W} \to \mathbb{R}$, defined as follows:

    $E : \mathcal{X}\times\mathcal{W} \to \mathbb{R}$,  $E(X,W) := \sum_{z\in\Omega} X(z)\,W(z)$,    (60)

    $V : \mathcal{X}\times\mathcal{W} \to \mathbb{R}_{\ge 0}$,  $V(X,W) := E((X-\langle X\rangle)^2, W) = E(X^2,W) - E^2(X,W)$,    (61)

    $S : \mathcal{X}\times\mathcal{W} \to \mathbb{R}_{\ge 0}$,  $S(X,W) := \sqrt{V(X,W)}$,    (62)

where in (61) $\langle X\rangle$ simply denotes the constant function $\langle X\rangle : z \mapsto E(X,W)$, and
$E^2(X,W) := [E(X,W)]^2$. In the main text we also write $E(X,s)$ if the symbol $s$
uniquely labels a point in $\mathcal{W}$, like $s = {\rm ap}$ for the a priori distribution (1), or $E(X,t_i)$
for the distribution $W_i$ at time $t_i$.

9 'Elementary' is merely to be understood as mathematical standard terminology, not in any
physical sense. For example, in the urn model, $\Omega$ is obtained after coarse graining from the space of
physically 'elementary' events.